In this work, we present the SOMOS dataset, the first large-scale mean opinion score (MOS) dataset consisting solely of neural text-to-speech (TTS) samples. It can be employed to train automatic MOS prediction systems focused on the assessment of modern synthesizers, and can stimulate advancements in acoustic model evaluation. It consists of 20k synthetic utterances of the LJ Speech voice, a public-domain speech dataset that is a common benchmark for building neural acoustic models and vocoders. The utterances are generated from 200 TTS systems, including vanilla neural acoustic models as well as models which allow prosodic variation. An LPCNet vocoder is used for all systems, so that the variation among samples depends only on the acoustic models. The synthesized utterances provide balanced and adequate domain and length coverage. We collect MOS naturalness evaluations on 3 English Amazon Mechanical Turk locales and share practices that lead to reliable crowdsourced annotations for this task. We provide baseline results of state-of-the-art MOS prediction models on the SOMOS dataset and show the limitations these models face when assigned to evaluate TTS utterances.
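The MOS values at the core of such a dataset are simple to compute: each utterance's score is the arithmetic mean of the 1-5 naturalness ratings given by crowdsourced listeners. A minimal sketch, with hypothetical utterance IDs and ratings (SOMOS itself provides 20k utterances, each with multiple annotations):

```python
# Computing mean opinion scores (MOS) from crowdsourced naturalness ratings.
# The IDs and ratings below are illustrative, not taken from SOMOS.
from statistics import mean

# Hypothetical raw annotations: utterance ID -> list of 1-5 naturalness ratings
ratings = {
    "sys042_utt007": [4, 5, 4, 3, 4],
    "sys113_utt019": [2, 3, 2, 3, 2],
}

# MOS per utterance is the mean of its listener ratings.
mos = {utt: mean(scores) for utt, scores in ratings.items()}
print(mos)  # {'sys042_utt007': 4.0, 'sys113_utt019': 2.4}
```

An automatic MOS prediction model is then trained to regress these per-utterance means directly from the audio.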
In reinforcement learning (RL) research, simulations enable benchmarks between algorithms, as well as prototyping and hyper-parameter tuning of agents. To promote RL in both research and real-world applications, frameworks are required that are, on the one hand, efficient at running experiments as fast as possible. On the other hand, they must be flexible enough to allow the integration of newly developed optimization techniques, e.g. new RL algorithms, which are continuously put forward by an active research community. In this paper, we introduce Karolos, an RL framework developed for robotic applications, with a particular focus on transfer scenarios with varying robot-task combinations, reflected in a modular environment architecture. In addition, we provide implementations of state-of-the-art RL algorithms along with common learning-facilitating enhancements, as well as an architecture to parallelize environments across multiple processes to significantly speed up experiments. The code is open source and published on GitHub with the aim of promoting research on RL applications in robotics.
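The environment-parallelization idea mentioned above can be sketched with the standard worker-process pattern: each process steps its own environment copy and returns transitions through a pipe, so the learner collects experience from many environments at once. This is a generic sketch, not Karolos's actual API; the counter "environment", `env_worker`, and `run` are stand-ins.

```python
# Generic sketch of per-process environments communicating over pipes.
import multiprocessing as mp

# "fork" keeps the sketch self-contained on POSIX; Windows would need the
# "spawn" start method and an importable module for the worker function.
ctx = mp.get_context("fork")

def env_worker(conn, env_id):
    # Stand-in environment: the state is a counter incremented by the action.
    # A real worker would wrap a robot-task environment and apply the
    # learner's actions, returning observations and rewards.
    state = 0
    while True:
        cmd, action = conn.recv()
        if cmd == "step":
            state += action
            conn.send((env_id, state))
        elif cmd == "close":
            conn.close()
            break

def run(n_envs):
    parents, procs = [], []
    for i in range(n_envs):
        parent, child = ctx.Pipe()
        p = ctx.Process(target=env_worker, args=(child, i))
        p.start()
        parents.append(parent)
        procs.append(p)

    # Broadcast one action to all environments, then gather the results;
    # the environments step concurrently in their own processes.
    for conn in parents:
        conn.send(("step", 1))
    results = [conn.recv() for conn in parents]

    for conn in parents:
        conn.send(("close", None))
    for p in procs:
        p.join()
    return sorted(results)

if __name__ == "__main__":
    print(run(4))  # [(0, 1), (1, 1), (2, 1), (3, 1)]
```

The payoff is that a single learner process amortizes its update cost over transitions gathered from all workers, which is what makes experiments significantly faster on multi-core machines.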